The landscape of facial processing applications in the context of the European AI Act and the development of trustworthy systems

This work focuses on facial processing, which refers to artificial intelligence (AI) systems that take facial images or videos as input data and perform some AI-driven processing to obtain higher-level information (e.g. a person’s identity, emotions, demographic attributes) or newly generated imagery (e.g. with modified facial attributes). Facial processing tasks, such as face detection, face identification, facial expression recognition or facial attribute manipulation, are generally studied as separate research fields and without considering a particular scenario, context of use or intended purpose. This paper studies the field of facial processing in a holistic manner. It establishes the landscape of key computational tasks, applications and industrial players in the field in order to identify the 60 most relevant applications adopted for real-world uses. These applications are analysed in the context of the new proposal of the European Commission for harmonised rules on AI (the AI Act) and the 7 requirements for Trustworthy AI defined by the European High Level Expert Group on AI. More particularly, we assess the risk level conveyed by each application according to the AI Act and reflect on current research, technical and societal challenges towards trustworthy facial processing systems.


Facial datasets analysis
Supplementary Table 1 provides the main characteristics of the most popular and/or state-of-the-art facial datasets, including the facial processing tasks for which they are used. The top rows list publicly released datasets, while the last two are private datasets owned by the Internet giants Google and Facebook.

Most popular evaluation metrics in facial processing
Supplementary Figure 2 compiles and illustrates the most widely used evaluation metrics found in facial processing benchmarks and leaderboards, and indicates the computational tasks to which they are applied. As can be observed, these metrics are mostly accuracy-based and include: overall accuracy, confusion matrix, precision, recall or True Positive Rate (TPR), False Positive Rate (FPR), F1 score, Receiver Operating Characteristic (ROC) curves and Normalized Mean Error (NME).
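As an illustration of the accuracy-based metrics listed above, the following minimal Python sketch derives precision, recall (TPR), FPR, overall accuracy and F1 score from the four confusion-matrix counts, and computes an NME for facial landmarks. The function names, the example counts and the choice of normalising distance (e.g. an inter-ocular distance) are assumptions made for illustration and are not taken from any specific benchmark.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Accuracy-based metrics from confusion-matrix counts (binary case)."""

    def safe_div(num, den):
        # Return 0.0 instead of failing when a denominator is empty.
        return num / den if den else 0.0

    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)  # also called True Positive Rate (TPR)
    fpr = safe_div(fp, fp + tn)     # False Positive Rate
    accuracy = safe_div(tp + tn, tp + fp + tn + fn)
    f1 = safe_div(2 * precision * recall, precision + recall)
    return {
        "precision": precision,
        "recall": recall,
        "fpr": fpr,
        "accuracy": accuracy,
        "f1": f1,
    }


def normalized_mean_error(predicted, ground_truth, norm_distance):
    """Mean Euclidean landmark error divided by a normalising distance.

    `norm_distance` is an assumption of this sketch; benchmarks differ in
    the distance they normalise by.
    """
    errors = [math.dist(p, g) for p, g in zip(predicted, ground_truth)]
    return sum(errors) / (len(errors) * norm_distance)


if __name__ == "__main__":
    # Hypothetical counts for a face detector evaluated on 100 candidates.
    print(classification_metrics(tp=8, fp=2, tn=85, fn=5))
    # Two hypothetical landmarks, normalised by a distance of 10 pixels.
    print(normalized_mean_error([(0, 0), (3, 4)], [(0, 0), (0, 0)], 10))
```

An ROC curve is obtained by sweeping the decision threshold of the system and plotting the resulting (FPR, TPR) pairs.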
In the case of face tracking (FT), performance is further evaluated over time in terms of tracking trajectory fragmentations and identity switches, but still following the classic accuracy concept of error assessment with respect to manual ground truth. Facial attribute manipulation (FAM) is the only exception to the accuracy concept as the objective in this case is the generation of credible (i.e. photorealistic) imagery that falls within the distribution of real images. This is usually assessed using the Fréchet Inception Distance (FID) 19.
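For reference, the FID is commonly written as the Fréchet distance between two Gaussians fitted to Inception-network features of real and generated images. The symbols below are notation introduced here for illustration: \mu_r and \Sigma_r denote the mean and covariance of the features of real images, and \mu_g and \Sigma_g those of generated images.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```

Lower values indicate that the generated imagery is statistically closer to the real image distribution.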

Supplementary Figure 2.
Compilation of the most commonly reported metrics in facial processing benchmarks and leaderboards. The illustration of fragmentation and ID-switches is adapted from 20.

Implementation example: unconstrained large-scale face identification at the edge
Supplementary Figure 3 illustrates a possible edge computing architecture which could be used to deploy a remote large-scale face identification system, such as the one considered in use case BI8. It follows a distributed computing paradigm, where a scalable number of edge nodes installed on-premise extract facial snapshots and corresponding biometric templates from live video streams. Central software installed in a control room (e.g. a server located at the police office) orchestrates the information coming from all edge nodes and performs facial matching. The generated identification alarms can finally be sent to on-premise law enforcement agents (e.g. to their smartphones or tablets) so that they can take the necessary actions.
Supplementary Figure 3. Example of edge computing architecture for remote face identification from multiple geographic locations which could be used for the implementation of use case BI8 "Unconstrained face identification". Police drawings are courtesy of Pixabay (https://pixabay.com).
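The central matching step of such an architecture can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of any deployed system: biometric templates are assumed to be plain feature vectors, matching is assumed to use cosine similarity against a watchlist with a fixed threshold, and all names (`Detection`, `match_detection`, `central_matcher`, the threshold value of 0.8) are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Detection:
    """Message sent by an edge node: its identifier and a face template."""
    node_id: str
    template: list  # hypothetical biometric template (feature vector)


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_detection(detection, watchlist, threshold=0.8):
    """Return (identity, score) of the best match at or above the threshold."""
    best_id, best_score = None, threshold
    for identity, reference in watchlist.items():
        score = cosine_similarity(detection.template, reference)
        if score >= best_score:
            best_id, best_score = identity, score
    return (best_id, best_score) if best_id is not None else None


def central_matcher(detections, watchlist, threshold=0.8):
    """Central software: match templates from all edge nodes, emit alarms."""
    alarms = []
    for det in detections:
        match = match_detection(det, watchlist, threshold)
        if match is not None:
            alarms.append(
                {"node": det.node_id, "identity": match[0], "score": match[1]}
            )
    return alarms


if __name__ == "__main__":
    # Hypothetical two-dimensional templates, for readability only.
    watchlist = {"subject_a": [1.0, 0.0], "subject_b": [0.0, 1.0]}
    stream = [Detection("edge-1", [0.9, 0.1]), Detection("edge-2", [1.0, 1.0])]
    for alarm in central_matcher(stream, watchlist):
        print(alarm)
```

Keeping template extraction on the edge nodes means that only compact templates and snapshots, rather than full video streams, need to reach the central server.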

Detailed information on identified facial processing applications
This supplementary section provides further information on the 60 use cases presented in the main paper. More particularly, each identified application is described in detail and accompanied by reference papers, corresponding facial processing tasks and key companies that offer products or have participated in related real-world deployments. Applications are divided into four separate tables, depending on the type of AI system they implement according to the AI Act:
• Biometric Categorisation (BC). Supplementary Table 2 presents 7 use cases using categorisation systems from facial images.
• Biometric Identification (BI). This category includes applications implementing "remote", "non-remote", "real-time" and "post" BI systems. A total of 20 facial BI use cases are presented in Supplementary Table 3.
• Emotion Recognition (ER). A total of 18 facial emotion recognition applications are enumerated in Supplementary Table 4.
• Other (OT). Supplementary Table 6 depicts 15 other facial analysis applications that do not belong to the categories above.
Note that, for the sake of clarity, a 3-letter code has been used in the tables to refer to companies. The correspondence between the name of each company and its code can be found later in Section List of key companies on facial processing, together with other relevant information about the company, namely its headquarters country, website URL and size (SME or large).

BC1 Demographic analysis
Extraction and aggregation of demographic statistics (age, gender, race) to obtain customer or visitor profiles (e.g. percentage of young females visiting a shop or museum during the weekend). The cameras can be located on the counters of a store, in a storefront (e.g. embedded in mannequins), at the entrance of a museum, etc. 21,22
FD + FT + FAE; BIC, MKT, TOU; ISS, HER, VIV, UNI, SCV, ETI, QUA, SIG, HBI, MEG, COG, VIX, ROC, TOS, GOR, RAI, EVO, AGV, VOC, VAT, KED, RNE, 3DI, AMA, ALM, FEL, OMR, PAC, VTE, CLA

BC2 Person search by facial appearance
From a large collection of video footage or images, automatic search for persons fulfilling a certain facial description (e.g. Asian female wearing a hat or middle-aged bald white man with glasses). These tools are used, for instance, by broadcasters to retrieve specific multimedia content, or by law enforcement (LE) bodies to search for specific person profiles as described by witnesses. 23

Automatic assessment of pain intensity (rather than asking for subjective feelings) and its long-term monitoring through facial expression analysis. These clinical studies have attracted the interest of pharmaceutical companies (e.g. painkiller vendors), medical insurance companies, etc. It is also very useful to assess pain in patients who cannot communicate (e.g. due to Alzheimer's disease). 165-167
FD + AU (+FER); CLI; PCK, BSA

ER17 Police interrogations
Analyse facial behavior and subtle reactions (facial micro-expressions), e.g. during police interrogations, court trials or border control interviews, in order to detect potential deception or affective states. 168

Transcribe or enhance speech content in videos using ALR when the audio channel is unavailable or highly noisy, e.g. in video-surveillance footage or multimedia content. It might also be used as a communication support system for the hearing impaired. 218

Key companies on facial processing

List of companies
This section contains the list of 183 companies identified as key players in the field of facial processing, as of December 2021 and in alphabetical order. Each company has been assigned a 3-letter code for easier identification. Other information provided is the country of headquarters, main web page URL and company size (SME vs large).

Distribution of companies by size and country
Supplementary Figure 4 shows the distribution of companies by size (SME vs large) and country of headquarters. As can be observed, companies are mostly located in the USA (59 companies), Europe (44 companies in EU27, 14 in the UK and 4 in Switzerland) and Asia (20 companies in China, 14 in Japan, 4 in South Korea, 3 in Hong Kong). India, which is positioned high on global charts tracking AI industry development and technology adoption 234, nevertheless has a comparatively small presence in facial processing (3 companies).
Overall, although large companies are present in this landscape, the majority of facial processing-related companies are SMEs (66%). Nevertheless, the ratio of large companies to SMEs differs by geographic location. The USA has, in absolute terms, the strongest presence of large firms (23 out of 59), followed by China (12 out of 20) and Japan (12 out of 14). Europe's facial analysis industry is dominated by SMEs (35 out of 44 companies in the EU27 are SMEs). The UK, with 14 identified companies, all of them SMEs, accounts for a large part of the European trend.

Supplementary Figure 4. Distribution of companies per country, size (SME vs large) and type of AI system commercialised (Biometric Identification - BI, Biometric Categorisation - BC, Emotion Recognition - ER or Other - OT). Countries marked with an asterisk (*) belong to the EU27, which groups the 27 member countries of the European Union 235. The world map was generated using the Folium v0.12.1 Python library (https://python-visualization.github.io/folium/).
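The difference between absolute and relative presence of large firms can be made explicit with a short calculation. The sketch below uses only the counts quoted in the text (the EU27 figure of 9 large companies is derived from 44 companies minus 35 SMEs, and the UK figure of 0 from all 14 being SMEs); the variable and function names are illustrative.

```python
# (large companies, total companies) per region, as reported in the text.
# EU27: 35 of 44 are SMEs, so 9 are large; UK: all 14 are SMEs.
COUNTS = {
    "USA": (23, 59),
    "China": (12, 20),
    "Japan": (12, 14),
    "EU27": (44 - 35, 44),
    "UK": (0, 14),
}


def large_share(region):
    """Fraction of a region's companies that are large."""
    large, total = COUNTS[region]
    return large / total


def rank_regions(key):
    """Regions sorted from highest to lowest value of `key`."""
    return sorted(COUNTS, key=key, reverse=True)


# Ranking by absolute number of large companies vs. by their share.
by_absolute = rank_regions(lambda region: COUNTS[region][0])
by_share = rank_regions(large_share)

if __name__ == "__main__":
    for region in by_share:
        large, total = COUNTS[region]
        print(f"{region}: {large}/{total} large ({large_share(region):.0%})")
```

Ranked by share rather than by absolute count, Japan and China come ahead of the USA, while the EU27 and the UK remain at the bottom in both rankings.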